Multiple alignment by aligning alignments
Abstract
MOTIVATION: Multiple sequence alignment is a fundamental task in bioinformatics. Current tools typically form an initial alignment by merging subalignments, and then polish this alignment by repeated splitting and merging of subalignments to obtain an improved final alignment. In general this form-and-polish strategy consists of several stages, and a profusion of methods have been tried at every stage. We carefully investigate: (1) how to utilize a new algorithm for aligning alignments that optimally solves the common subproblem of merging subalignments, and (2) what is the best choice of method for each stage to obtain the highest quality alignment.

RESULTS: We study six stages in the form-and-polish strategy for multiple alignment: parameter choice, distance estimation, merge-tree construction, sequence-pair weighting, alignment merging, and polishing. For each stage, we consider novel approaches as well as standard ones. Interestingly, the greatest gains in alignment quality come from (i) estimating distances by a new approach using normalized alignment costs, and (ii) polishing by a new approach using 3-cuts. Experiments with a parameter-value oracle suggest large gains in quality may be possible through an input-dependent choice of alignment parameters, and we present a promising approach for building such an oracle. Combining the best approaches to each stage yields a new tool we call Opal that on benchmark alignments matches the quality of the top tools, without employing alignment consistency or hydrophobic gap penalties.

AVAILABILITY: Opal, a multiple alignment tool that implements the best methods in our study, is freely available at http://opal.cs.arizona.edu.
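To make the distance-estimation stage named in the abstract more concrete, the following is a minimal sketch of one way to derive pairwise distances from normalized alignment costs. The abstract does not give the formula, so everything here is an assumption for illustration: the unit mismatch and per-symbol gap costs, the normalization by mean sequence length, and the function names `alignment_cost`, `normalized_distance`, and `distance_matrix` are hypothetical and are not taken from Opal.

```python
from typing import List


def alignment_cost(a: str, b: str, mismatch: float = 1.0, gap: float = 1.0) -> float:
    """Minimum global alignment cost of a and b.

    Assumed scoring: 0 for a match, `mismatch` for a substitution,
    `gap` per gapped symbol. This is a simplification, not Opal's scoring.
    """
    # prev[j] holds the cost of aligning the first i-1 symbols of a
    # with the first j symbols of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitute = prev[j - 1] + (0.0 if a[i - 1] == b[j - 1] else mismatch)
            delete = prev[j] + gap       # a[i-1] aligned to a gap
            insert = curr[j - 1] + gap   # b[j-1] aligned to a gap
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Alignment cost divided by mean sequence length (assumed normalization)."""
    mean_len = (len(a) + len(b)) / 2
    return alignment_cost(a, b) / mean_len if mean_len else 0.0


def distance_matrix(seqs: List[str]) -> List[List[float]]:
    """Symmetric matrix of normalized distances between all sequence pairs."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = normalized_distance(seqs[i], seqs[j])
    return dist
```

A matrix like this could then feed the merge-tree construction stage, for example through a standard clustering method. Which normalization and tree-building method work best is exactly the kind of question the study evaluates, so this sketch should not be read as its conclusion.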
Related resources
Optimal Alignment of Multiple Sequence Alignments
Chapter 1: Introduction. 1.1 Perspective. 1.2 Contributions. 1.3 Overview ...
Aligning Alignments Exactly (Extended abstract)
A basic computational problem that arises in both the construction and local-search phases of the best heuristics for multiple sequence alignment is that of aligning the columns of two multiple alignments. When the scoring function is the sum-of-pairs objective and induced pairwise alignments are evaluated using linear gap-costs, we call this problem Aligning Alignments. While seemingly a strai...
Aligning Alignments
While the area of sequence comparison has a rich collection of results on the alignment of two sequences, and even the alignment of multiple sequences, there is little known about the alignment of two alignments. The problem becomes interesting when the alignment objective function counts gaps, as is common when aligning biological sequences, and has the form of the sum-of-pairs objective. We b...
Learning Models for Aligning Protein Sequences with Predicted Secondary Structure
Accurately aligning distant protein sequences is notoriously difficult. A recent approach to improving alignment accuracy is to use additional information such as predicted secondary structure. We introduce several new models for scoring alignments of protein sequences with predicted secondary structure, which use the predictions and their confidences to modify both the substitution and gap cos...
COACH: profile-profile alignment of protein families using hidden Markov models
MOTIVATION Alignments of two multiple-sequence alignments, or statistical models of such alignments (profiles), have important applications in computational biology. The increased amount of information in a profile versus a single sequence can lead to more accurate alignments and more sensitive homolog detection in database searches. Several profile-profile alignment methods have been proposed ...
Journal title:
- Bioinformatics
Volume 23, Issue 13
Pages: -
Publication date: 2007